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Abstract 

In recent years, spatial and spatio-temporal modeling have become an important area 
of research in many fields (epidemiology, environmental studies, disease mapping). In 
this work we propose different spatial models to study hospital recruitment, including 
some potentially explicative variables. Interest is on the distribution per geographical 
unit of the ratio between the number of patients living in this geographical unit i, say 
yi, and the population, Ni in the same unit . Models considered are w ithin the frame- 
work of Bayesian Latent Gaussian models (|Fahrmeir and Tutj . l200lh . Our response 
variable yi is assumed to follow a binomial distribution, with logit link, whose param- 
eters are the population Ni in the geographical unit i and the corresponding relative 
risk 7Tj. The structured additive predictor r\i accounts for effects of various covariates 
in an additive way: r/i = a + Y^ii + Hk=i PkZki + U- Here, the / (j) (-)s 

are unknown functions of the covariates u (which also includes a spatial effect) the 
/3fcS represent the linear effect of covariates z and the as are unstructured terms. To 
approximate posterior marginals, which not availabl e in clo sed form, we use integrated 
nested Laplace approximations (INLA) (|Rue et all 12005^ ). recently proposed for ap- 
proximate Bayesian inference in latent Gaussian models. INLA has the advantage of 
giving very accurate approximations and being faster than McMC methods when the 
number of parameters does n ot exceed 6 (as it is in our case). Model comparisons are 
assessed using DIC criterion (|Spiegelhalter et alll2002ft . 



1 Introduction 

Analysis of spatial hospital utilization patterns is a fundamental requirement 
for effective health services planning and hospital management. Two different 
approaches are used in these studies: 

(a) The descriptive approach consists essentially in recruitment mapping. 
Maps can be produced for different population groups, for example dis- 
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tinguishing males and females or age categories. The criteria for these 
distinctions are clinical and not statistical-based. Moreover, the maps 
need a spatial smoothing in order to be more easily interpretable. 

(b) The explicative approach, the aim of which is to find what variable(s) can 
explain differences in the observed recruitment. Different statistical mod- 
els are constructed and compared in order to highlight these variables. 
Furthermore, models can be used for simulating the effect on the recruit- 
ment of the modification of some of these variables (for example what 
happens if a new road decreases the access time to a certain hospital from 
some cities of a region?). Models can also be used to take into account 
population structure evolution on the predictions of the recruitment. 

The study of the recruitment, for instance, for a particular hospital, a given 
disease, or a particular age range, implies having a geographic reference of pa- 
tient residence. Identifying the place of residence of patients may allow geo cod- 
i ng an d subsequent use of geostatistical models for point processes (see Cressid 



(1993)). The problem arises of determining the population at risk to be matched 



to each patient or patient group. Furthermore, the reliability of the exact ad- 
dress is not assured. Models for grouped data applied to geographic units are 
then meaningless. The address of each patient is reported to a geographical 
unit, in which a population at risk can be determined. Different possibilities for 
defining geographical units can be explored. The most detailed level available in 
years is the municipality (" commune" , defined by the French National Institute 
for Statistics and Economic Studies, INSEE). The French National Geographic 
Institute (IGN) calculates the " ce ntroid" of each municipality, that is to say vir- 
tual centers taking into account the shape of the municipality. These centroids 
can also be used to locate recruited cases. The municipality of residence is a 
dataset item that is always present in the hospital information systems, proba- 
bly even updated if necessary at each patient visit. In the rest of this paper we 
will retain the municipal level but generally we will speak of the geographical 
unit. 

Concerning the hospital recruitment, the interest is on the distribution per 
geographical unit of the ratio between the number of patients living in this 
geographical unit i, say j/j, and the population, number of persons "at risk" 
to visit an healthcare provider, N% in the same unit. We call this ratio, 
the standardized recruitment ratio (SRR). We assume that the response vari- 
able Tji independently follows a binomial distribution whose parameters are the 
population Ni and a particular risk per unit 7Tj. 

If logit(7Tj) = rji, we then have 7Tj = jf . The covariates enter the model 
additively through the predictor rji, 



logit(TTi) = 7^ = /! + f^iUji) + ^ PkZki 

j=l k=l 



Here, the / <J -'(-)s are unknown functions of n/ covariates in u (including also 
a spatial effect), the /3fcS represent the linear effect of np covariates in z and 
the €iS are unstructured terms. T he model adopted is a str uctured additive 



regression model (StAR model), see iFahrmeir and Tuta ([20011 ) . In this model, 



the response variable yi is assumed to belong to an exponential family, where 



2 Data description and explanatory analysis 



the mean fa is linked to a structured additive predictor rji through a link func- 
tion <?(•), so that g{fa) — rji. The structured additive predictor 77, accounts for 
effects of various covariates in an additive way. This class of models can be 
complex and hierarchical, involving fixed and random e ffects and are particu - 



1995t iBaneriee et all l2003) . 



larly suited to Bayesian inference (jGelman et al. 
although in this context, the term "fixed" no longer has the classical meaning 
it has under purely frequentist inference. Our aim is to use the model above 
to explain spatial recruitment in Haute Alsace, a region in the north-east of 
France, using data from the public hospital of Mulhouse, the biggest town of 
the region. Several alternative explic ative variables are conside r ed. As is often 
found in disease mapping literature ( Bernardinelli et al. . 19951 Rue and Heidi . 



20051 ) we have chosen Gaussian priors for /i, /(•), j3 and e. Our model is then 
a latent Gaussian model and parameters x = (fi, /(•), (3, e) are called latent 
Gaussian variables. Hyperparameters 6 involved in prior elicitations are not 
necessarily Gaussian. The common approach to inference for latent Gaussian 
models is Markov chain Monte Carlo (McMC) sampling. It is well known, how- 
ever, that McMC methods tend to exhibit poor performance when applied to 
such models. Various factors explain this. First, the components of the latent 
field x are strongly dependent on each other. Second, 6 and x are also strongly 
dependent, especially when n is large . Desp ite developments (see for instance 



dependent, especially wnen n is large . Uesp ite developments (see lor instance 
Baneriee et ail (|2008h : iHeld and Ruel (<2010h ) for overcoming this poor perfor- 



mance, McMC sampling remains painfully slow from the end user's point of 
view. To approximate posterior marginals, we use integrated nest ed Laplace ap- 
proximations (INLA) ([Rue et all 120091 iRue and Martinol . 120071 ). recently pro- 
posed for approximate Bayesian inference in latent Gaussian models. INLA has 
the advantage of giving very accurate approximations and being faster than 
McMC methods when the number of parameters does not exceed 6 (as it is 
in our case). Model comparison and selection will be asse ssed using Deviance 
information criterion (DIC), see ISpiegelhalter et al.l (|2002l ). Implementation of 
sp ace and space-time models with INLA are presented and explained in detail 
in lSchrodler and Held! (l2009allbT ) . 



The structure of the paper is as follows: in Section 2 we describe the data in 
detail. In Section 3 we introduce and justify the model used including details on 
assumptions on the priors and on the INLA method used for inference. Results 
obtained and comparisons of different models are shown in Section 4. We finish 
with a discussion in Section 5. 



2 Data description and explanatory analysis 

Data are from the public hospital of Mulhouse (its location is shown on Figure 
(JT|)), the biggest town of the Haute Alsace region in north-east of France. This 
region, adjacent to Germany and Switzerland, is 3,525 km 2 and has 756,974 
inhabitants (01/01/2010) in a very dense irregular lattice of 377 municipalities 
("communes") which are the geographical units we use. The largest distance 
between the centroids of two geographical units is about 95 km. 
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Fig. 1: Proximity zones of the region (Zl: Altkirch, Z2: Colmar, Z3: 
Guebwiller, Z4: Saint-Louis, Z5: Selestat, Z6: Thann, Z7: Mul- 
house). The studied healthcare provider is HCP1 and the sec- 
ond provider is HCP2 

In 2008, all the 12 healthcare providers in the region recorded 182,487 visits 
(in- and outpatients). The hospital of Mulhouse recorded 48,747 among these 
(27%). In this paper, we only consider the 33,682 inpatients. The distribution 
of the number of cases across the 377 geographical units is very heterogeneous: 
there are between and 12,330 cases per geographical unit, the mean is 89 cases 
but the median is 17 with 99% of the geographical units having less than 1,019 
inpatients. Only 4 geographical units had more than 1,000 inpatients. Figure 
@ represents the observed recruitment ratio per geographical unit (calculated 
as the number of inpatients which in a given geographical unit divided by the 
population of this geographical unit). 
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Fig. 2: Observed recruitment ratio per geographical unit 

We have different potential explicative variables affecting recruitment. More 
precisely we have to deal with the following requirements: 

(a) Practitioners send their patients preferentially (except for some particu- 
lar pathologies) to a given healthcare provider. These practitioners fil- 
ter patients on several closed geographical units. This means that the 
recruitment in a given geographical unit is more "similar" to that in a 
closed unit than that in another random unit in the region. This is the 
definition of spatial autocorrelation and we can assume that the use of sta- 
tistical models taking into account the autocorrelation greatly improves 
the explanation of the recruitment. 

(b) The distance or the access time between the healthcare provider and the 
geographical unit of residence reflects the case of access to this healthcare 
provider. The access time may have a greater influence on recruitment in 
the context of emergency. Specifically, interest is focused on measuring 
the attenuation of recruitment with distance. Random walks can be used 
to smooth this attenuation. 

(c) A recent French healthcare policy introduced the notion of "proximity 
zones" . The region is divided into several of these zones, shown on Figure 
(JT|), each centered by a healthcare provider to which its patients are re- 
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cruitcd. But there are different levels of providers according to their tech- 
nical capacities and competencies. A bigger provider also has to recruit 
patients (for various specific pathologies) into several of these subregions. 

(d) Some other covariates can also influence the recruitment, such as age, 
geographical characteristics or economic status of the geographical units, 
etc. Most of these covariates are beyond the topic of this paper. Herein, 
we test only two: 

(i) The distance between each geographical unit and a second important 
healthcare provider (HCP2 on Figure ([T])) assuming that patients 
living nearer this second provider will prefer to go there rather than 
the first. 

(ii) The density of practitioners in each geographical unit (for 1,000 in- 
habitants), assuming that a higher density will result in a higher 
recruitment. 

This consideration lead us to consider Bayesian stru ctured additive regres- 



sion models (StAR models) ([Fahrmeir and Tuta . 120011) . We will present the 



adopted model in detail in the following section. 



3 Statistical Model 

We assume that the response variable yi, the number of observed cases in the ith 
geographical unit (i — 1, •, 377) follows a binomial distribution with parameters, 
Ni and 71",-, where Ni indicates the population and 7T; is the relative risk. Thus 
Hi ~ Bin(JVj, 7Tj). We consider the logit link and the following additive structure 
for the linear predictor: 

n } rip 
logitfo) = m = M + f [a) M + + fi S) + ti U) - (!) 

a=l k=l 

Here, the / (a) (-)s are unknown functions of the covariates it, the /3fcS represents 
the linear effect of covariates z, V- 8 ' is a spatially structured component and r"-* 
is a spatially unstructured component. The unstructured spatial component 
can be used as a proxy for important environmental covariates not included in 
the analysis. 

We assume the following prior distributions: 

• f^ a ' follows an intrinsic second-order random walk model with precision 




In addition, on and / 2 are specified vague priors (for example uni- 
form) . 

• The model for the spatial structured component is an intrin- 
sic conditional autoregressive process (or Markov Gaussian random 
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field) (|Besag et al.l . ll99lUMolliel.ll996l) . ICAR, which assumes that, con- 



ditionally on the spatial effect across adjacent geographic units, the effect 
in a unit follows a normal distribution. The average of this distribution is 
the average of spatial effects in the surrounding units and its variance is 
proportional to the number of neighbors of this unit. If /j is the effect in 
the unit i and f_; the effects in units other than i of the study area, then 
the ICAR can be written: 



In this formula, n, is the number of units adjacent to each i and di repre- 
sents the set of all of these adjacent units. The adjacency between units 
is most often defined according to the notion of common boundary i.e. 
are considered as adjacent if two units share a common border. The only 
parameter to estimate is then t^ s \ the precision parameter of the ICAR. 
In this model, one may ask whether any spatial effect of the data is taken 
into account by the ICAR. We thus can seek to distribute the residuals on 
each of the geographical units. The association of this residual "hetero- 
geneity" and the autocorrelation is the model traditionally used in disease 
mapping risks and called " convolution prior" involving an intrinsic condi- 
tional autoregressive process (for autocorrelation) and a normal distribu- 
tion by geographical unit (for heterogen eity). Then are independen t 
zero-mean Gaussian with precision (jBesag et al.l . Il99lk IrVlollia . Il996h . 

We will assign independent L(0. 001, 0.001) priors to the hyperparameters 
(7-W, t( s \ t^) t and a A/"(0, 0.01) prior to [i and to Latent Gaussian models 
are a subset of Bayesian additive models with a structured additive predictor, 
in which Gaussian priors are assigned to /z, all /(•), /3. Let x be the vector of 
all the n Gaussian variables fj,, /(•) and /3 and the vector of hyperparame- 
ters, which are not necessarily Gaussian. The main goal of a Bayesian inference 
method is to estimate the posterior distribution 

n(xi\y) = J ir(xi\0,y)Tr(e\y)de. (2) 

We present the INLA approach for approximating the posterior marginals of 
ft( x i\y)i * = 1, ■ • • ,n. The approximation is computed in three steps. The first 
step approximates the posterior marginal of 6 by using the Laplace approxima- 
tion. The second step computes the Laplace approximation or the simplified 
Laplace approximation of n(xi\y,0) for selected values of 9. The third step 
combines the previous two steps and uses numerical integration for retrieving 
the final estimate of equation @. 

First step The marginal posterior density n(8\y) of the hyperparameters in 
@ , is approximated in the following way: 



~ ta \ \ K{x,6\y) 

n0|y) oc — 



7TG(x|0,y) 



x=x*(6») 



where 7rG( x l^,y) is the Gaussian approximation of 7r(x|0,y) and x*(0) is 
the mode of 7f(x|0, y). 
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The main use of 7r(0|y) is to integrate out the uncertainty with respect 
to 6 when approximating the posterior marginal of xi. For this task it is 
sufficient to be able to select good evaluation points for the numerical in- 
tegration. We locate the mode of 7f(0|y), by optimizing log (7r(0|y)) with 
respect to 6. This can be done by using some quasi-Newton method. Let 
0* be the modal configuration, at 8* we compute the negative Hessian 
matrix H, using finite differences. Let £ = H^ 1 , which would be the 
covariance matrix for if the density were Gaussian. To help the explo- 
ration, we use standardized variables z instead of 6. Let £ = VAV T be 
the eigendecomposition of £, and define 8 via z: 

0(z) = 8* + VA^z. 

We explore log (7r(<?|y)) by using z-parameterization. We start from the 
mode (z = 0) and go in the positive direction of z\ with step length d z , 
say 8 Z = 1, as long as 



log(7f(0(O)|y))-log(7f(0(z)|y)) < 



(3) 



where for example, 8^ — 2.5. Then we switch direction and do similarly. 
The other coordinates are treated in the same way. Posterior marginals 
for 0j can be obtained from Tr(0\y) by using numerical integration but 
this is computationally demanding. Then we use the points that satisfy 
the equation ([3]) to construct an interpolant to log (7r(#|y)) and compute 
marginals by using numerical integration from this interpolant. 

Second step We have now a set of weighted points {Ok} and have to find an 
accurate approximation for the posterior marginal for the XiS, conditioned 
these selected values of 6. The density Tr(xi\6,y) is approximated using 
the Laplace approximation defined by: 



ir LA (xi\6,y) cx - 



7r(x,0,y) 



7r GG (x_j|a; i ,0,y) 



(4) 



where x_i denotes the vector x with the ith component omitted, 
^Gc{ yi -i\xi 1 8, y) is the Gaussian approximation of 7r(x_i|xi, 6, y) and 
x^(:Ei,0) is the mode of 7r(x_j|:ri, 0, y). To obtained a simplified ver- 
sion of such a Laplace approximation TTsLA(xi\6, y), which is defined as 
the series expansion of TTLA(xi\6,y) around Xi — fJ.i(9), it is necessary to 
approximate the mode in the following way: 



xliixiiO) w E #G (x_ i |a; l ) 



(5) 



The conditional expectation ([5]) for Gaussian variables implies the follow- 
ing identity: 

(6) 



- a ij(V) 



a ,{6) 

for some dij(9) when j ^ i. Denote 



<Ti(0) 



( s ) x t - Hi{0) 

f. = 
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Define then the following quantity, that we suppose exists: 



df> (xi , 6) = ^-3 log n(yj \xj , 6) 



The numerator and the denominator of expression Q can be expanded 
around Xi = fJ.i(6) using the approximation ([5]) and the following lemma: 

Lemma 1: Let x = (xi, • • • , x n ) T ~ Af(0, S); then for all Xi 

- i(x„E(x_,| a ; 1 ) T )S- 1 ( 3; „E(x_ 1 |i,)) = (7) 

Approximating up to third order, we obtain for the log of the numerator 
of expression 



+ \( X *' ) T E 4 3 W),0){^(0Hi(0)} 3 + ---- (8) 



where j 6l\i can take all values between 1 and n, except i. For the log 
of the denominator, we obtain: 

log7T G G(x_i|a;i,0,y) = constant- 

I x_i— E^- G (x.—i\xi) 

- \^ ] E var^^liiJdf^W.fl)^^)^-^) + • • • , (9) 

jei\i 

where 

i>ar> G (xj ;|xj) = a 2 {l — corrji G (xi,Xj) 2 } . 
Define the following quantities: 

= \ E ^^(x.-ix^df^^),©)^^)^,-^) (io) 
7 J 8) (e) = £ ^ 8 W),^(0Hi(0)} 3 (11) 

then replacing JTU]) in © and (JTTJl 

in (J5J) we obtain 

log-(x,0,y) I(^)) 2 + I(^)) 3 7 f) W + ... (12) 



and 



log7T GG (x_ 4 |x i ,0,y) = constant — x- '7] '(9) H (13) 

X_a = -E^ G (x_a|a:a) 



Finally we have: 

logfsz^xp^y) = log7r(x,0,y) - log 7r GG (x_ i |x i , 0, y) =(14) 
= constant - i (x«) V 7 «(0)x| s) + i (x^)* #(0) + • ■ • . 
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Third step This step combines the previous two steps with numerical integra- 
tion: 

Hx t \y) =J2 jt SLA(x i \9 k ,y)Tr(0 k \y)A k . (15) 

k 

The sum is over values of defined in the first step with area weights 

Besides the explicative model ([1} we also include a descriptive model in the anal- 
ysis which takes into account only spatial autocorrelation: logit(TTj) = fi + 
or assume "convolution prior" for the spatial components: logit(TTj) = (i + f^ + 
f^. In the best of the two previous models, we add in several explicative mod- 
els different potential effects, in f- a ': distance or access time to the healthcare 
provider, distance between geographical unit of residence and the second health- 
care provider and medical density of the geographical unit. The proximity zones 
are added using indicator variables (the zone where the healthcare provider un- 
der study is, is used as the reference zone): Pk&(i £ Zk) where l(i G Z k ) = 1 
only if the geographical unit i belongs to the proximity zone Z k . The INLA R 
package is used for implementing models. The goodness of fit of each model 



is ass essed using the deviance information criterion (DIC) ( Spi eeelhalter et al 



20021 ). as a generalization of the Aka'fke score. We also use DIC for comparing 
models. The DIC is defined as DIC — D +pd, decomposed like penalized like- 
lihood indicators into two terms: D measuring the fit to data and po measuring 
the complexity of the models. 



4 Results 



Each model needs less than 30 seconds in R, on a PC with a 2.29 GHz dual core 
processor (compared to 2 or 3 hours using fully Bayesian inference). 



4.1 Descriptive models 

We compare the ICAR-only model with the convolution prior model. The DIC 
of the former is 2253.6 (for an effective number of parameters of 230.6) and the 
DIC of the later is 2253.5 (and 231.5), indicating that the first model is good 
enough on the dataset. Figure ([3]) is the exponential of the ICAR spatial effect 
(readable as a relative risk). 
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0.23 




2.22 




3.15 




7.27 




Fig. 3: Exponential of the spatial effect (ICAR), descriptive model 



4.2 Explicative models 



Tab. 1: DIC of the different explicative models 



Model 


Pd 


DIC 


ICAR alone 


230.6 


2253.6 


ICAR and distance to provider 


224.7 


2254.3 


ICAR and access time to provider 


224.6 


2245.1 


ICAR and distance to the second provider 


203.5 


2241.1 


ICAR and proximity zone (as factor) 


196.7 


2234.5 


ICAR and medical density 


231.6 


2254.6 



Table ([l} summarizes the DICs of different models. The DIC indicates that in 
addition to spatial effect by an ICAR prior, taking into account the distance to 
provider (surprisingly) or practitioner density, does not improve the model. On 
the contrary, in addition to the ICAR prior, taking into account the access time 
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between the geographical unit of patient residence and the healthcare provider 
improves the model compared with the similar model with only the ICAR prior 
(DIC increases by 7.5). In Figure the relative risk decreases for time less 
than 20 minutes but remains greater than 1, then decreases again to a value 
below 1 after 40 minutes. 




Fig. 4: Exponential of the access time effect, explicative model 

The distance to the second provider also improves the initial model and 
Figure (|5|) shows the major concurrent effect of this second provider on the first 
one when patients live less than 20 kilometers away from it. 
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"1 1 1 1 1 1 1 1 

10 20 30 40 50 60 70 

Distance to the second healthcare provider 



Fig. 5: Exponential of the distance to second healthcare provider ef- 
fect, explicative model, (same legend as Figure (|4|)) 

Tabic (JTJ) shows, furthermore, that the proximity zone greatly improves the 
initial model. We use fixed effect and random effect for this covariate but as the 
results are slightly similar, only the results with fixed effect are shown. Table 
([2]) summarizes these results (with respect to the zone number 7 where the 
healthcare provider is situated). Figure ^ shows that, even adjusted to zone, 
an over-recruitment persists in the zone number 7 and an under-recruitment in 
the north (role of the second healthcare provider) but also in a small South-East 
sub-region. 



Tab. 2: Proximity zone (fixed effect) 

Posterior mean 95% confidence interval 



1 Altkirch 


0.83 


0.63 


1.07 


2 Colmar 


0.08 


0.05 


0.13 


3 Guebwiller 


0.39 


0.29 


0.52 


4 Saint-Louis 


1.07 


0.79 


1.44 


5 Selestat 


0.04 


0.01 


0.11 


6 Thann 


1.24 


0.92 


1.66 


7 Mulhouse 




Reference zone 





5 Discussion 



14 



<1.02 

[1.02-2.27) 
[2.27-3.52) 
>3.52 





Fig. 6: Exponential of the proximity zone adjusted spatial effect, ex- 
plicative model 

5 Discussion 



Introduction of covariates in the models we tested is straightforward but we must 
keep in mind that nothing is gained with the INLA technique if the number of 
these covariates is more than 6. If this is the case, the McMC technique remains 
the most useful technique. Furthermore, INLA relies on a latent Gaussian model 
and in order to smooth the effect of distance, we used a random walk. A power- 
ful alternative of random walks is to use splines ( Ruppert et all 2003 ). A mong 
different splines, linear combinations of B-splines (jEilers and MarxTl996l) have 
useful properties and offer a lot of flexibility but even a certain " wiggliness" . 
A solution is to penalize second derivative s or d ifferences on the coefficients 
of the linear combination (Eile rs and MarxLll996l) . We then obtain P-splincs. 
In the Bayesian framework, th e stochastic equival e nt of differences are random 
walks. For example, following lLang and Brezgerl ([20041 ) . if B are B-splines of 



order d, a P-spline is defined by ^2^=i PmB m , assuming k regularly spaced 
knots. Priors on /3 are then random walks of first or second order with gaus- 



5 Discussion 



15 



sian errors e m ~ N(0, ^-). Priors on fli and eventually /32 are flat (uniform 
distributions) and gamma prior is assumed on the precision r. But the es- 
timation of this kind of model using INLA is not possible. P-splines can be 



reform ulated as a latent Gaussian model (see for example ICrainiceanu et al 



(120051) 1 but there are as many parameters to be estimated as the number of 
spline knots. The fully Bayesian framework using McMC is hence the better 
approach, using for example BayesX specifically devoted to StAR models or, 
with more difficulty, BUGS. In our case using random walks as prior on effects 
was not an important limitation as it was easy and rational to round or cate- 
gorize our variables. Of course, outside the latent Gaussian model framework 
needed for INLA and in addition to the ICAR prior for spatial effect, relying 
on adjacency betw een geographical units , sever al other model can be used for 
spatial smoothing (Kammann a nd Wand . l2003h . fo r example bid imensional P- 
splines ( Lang and BrezgerT 20041) (or more generallv lWoodl ( 20061 )) can be fitted 
on the centroids of the geographical units, as implemented in BayesX. 

We consider here that the number of people at risk is the population of a ge- 
ographical unit. But we could also apply to this population a factor representing 
the proportion of the population that can be recruited. This " hospitalizability" 
is different according to the concerned pathology: e.g. 20% of the total popula- 
tion or 30% of men over 75 years, based on the prevalence of diabetes. Rather 
than consider expected number as a fixed percentage of the population, it is 
logical to try to adjust this percentage, for example, by age. This echoes the 
traditional techniques of standardization of risk used, for example, in disease 
mapping. 

Eventuallv lRue et al.l (|2009l ) described two useful methods for approximating 
ir(xi\0,y) in the equation ([2]). In this paper we describe and use a simplified 
Laplace approximation but two Laplace approximations in the equation (j4]) can 
be used instead of. The accuracy of the simplified Laplace approximation can be 
not good enough for the computation of predictive measures (like conditional 
predictive ordinate or cross- validated probability inte gral transform) and the 
full Laplace approximation has sometimes to be used (I Held and Rud . l2010l ). 

An alternative way, besides splines or random walks for modeling the atten- 
uation of the recruitment with distance d, can be to use a generalization of the 
Reilly distribution. Recruitment can then vary with ^ , where p is a parameter 
to be estimated. In the case of binomial distribution for the response variable, 
we need to transform ^ on the logit scale and hence the function to be included 
in the models is — ln(d p — 1). In a Bayesian framework we would assume a vague 
prior distribution on p, for example a uniform on to 5. We tested the use of 
this generalized-Reilly distribution in our application (unpublished manuscript) 
but it seems that these models are not flexible enough compared with smooth- 
ing by random walks. On the other hand, Reilly is parametric function and an 
estimation of the function parameter can be retrieved from the model and eas- 
ily interpreted. An important limitation of the approach using aggregated data 
and Poisson or binomial models is that an observation in the dataset is a set of 
covariates related to the geographical unit and potentially also to some other 
variables which need to be categorized. For example, in our application if we are 
interested in the recruitment taking into account the age of patients, we have 
to categorize the age and count the number of patients in all the combinations 
of geographical units and age catego ries. A powerful approach is then to use , 
for example, Poisson-kriging models ( Goovaerts . 20061 Goovaerts and Gebreab . 
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20081 ). The attenuation of the recruitment according to the distance or access 



time, can be isotropic (the recruitment is hence the same on all points of a 
circle around the healthcare provider) but can also be anisotropic. For exam- 
ple, we can assume that the distance effect will not be the same in all of the 
proximity zones, Z , and build a varying coefficient model including some terms 
I(i £ Zk) ■ f(di), where f(di) can be modeled using random walks. 
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